
26 research outputs found

Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice

Author: Bogoychev Nikolay
Grivas Andreas
Lopez Adam
Publication date: 21/03/2022

Classifiers in natural language processing (NLP) often have a large number of output classes. For example, neural language models (LMs) and machine translation (MT) models both predict tokens from a vocabulary of thousands. The Softmax output layer of these models typically receives as input a dense feature representation, which has much lower dimensionality than the output. In theory, the result is that some words may be impossible to predict via argmax, irrespective of input features, and empirically, there is evidence that this happens in small language models (Demeter et al., 2020). In this paper we ask whether it can happen in practical large language models and translation models. To do so, we develop algorithms to detect such unargmaxable tokens in public models. We find that 13 out of 150 models do indeed have such tokens; however, they are very infrequent and unlikely to impact model quality. We release our algorithms and code to the public.
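The geometry behind unargmaxable classes can be seen in a toy setting. The sketch below is not the authors' detection algorithm, which targets real high-dimensional models. It assumes a bias-free softmax layer (logits = W x) with only 2-dimensional input features. In that case a class can be the strict argmax for some input x exactly when its weight row is an extreme point of the convex hull of all weight rows, so a convex-hull routine (Andrew's monotone chain) is enough to find the unargmaxable classes. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # one row of a 2-d softmax weight matrix W


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points: List[Point]) -> Set[Point]:
    """Extreme points of the convex hull via Andrew's monotone chain.

    Duplicates are removed by set(); popping whenever cross <= 0 discards
    collinear points, so points lying on a hull edge are not vertices.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return set(pts)
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return set(lower) | set(upper)


def unargmaxable_classes(weights: List[Point]) -> List[int]:
    """Indices of classes that are never the strict argmax of W x.

    Assumes no bias term. A row that is not a hull vertex lies in the convex
    hull of the other rows, so its logit can never beat all of theirs; a row
    that duplicates another row can at best tie.
    """
    vertices = hull_vertices(weights)
    counts: Dict[Point, int] = {}
    for w in weights:
        counts[w] = counts.get(w, 0) + 1
    return [i for i, w in enumerate(weights)
            if w not in vertices or counts[w] > 1]


# Four rows on the corners of a square plus one row at its centre.
W = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.0, 0.0)]
print(unargmaxable_classes(W))  # [4]
```

In the example, the class whose weight row sits at the origin lies inside the square formed by the other four rows, so its logit never strictly exceeds all of theirs, whatever the input. Adding a bias term or moving to higher dimensions calls for a more general test, such as a linear-programming feasibility check.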

arXiv.org e-Print Archive

Edinburgh Research Explorer

Fast machine translation on parallel and massively parallel hardware

Author: Bogoychev Nikolay Veselinov
Publication venue: The University of Edinburgh
Publication date: 01/07/2019

Parallel systems have been widely adopted in the field of machine translation because the raw computational power they offer is well suited to this computationally intensive task. However, programming for parallel hardware is not trivial, as it requires redesigning existing algorithms. In my thesis, I design efficient algorithms for machine translation on parallel hardware. I identify memory accesses as the biggest bottleneck to processing speed and propose novel algorithms that minimize them. I present three distinct case studies in which minimizing memory access substantially improves speed. Starting with statistical machine translation, I design a phrase table that makes decoding ten times faster on a multi-threaded CPU. Next, I design a GPU-based n-gram language model that is twice as fast per £ as a highly optimized CPU implementation. Turning to neural machine translation, I design new stochastic gradient descent techniques that make end-to-end training twice as fast. The work in this thesis has been incorporated into two popular machine translation toolkits: Moses and Marian.

Edinburgh Research Archive

Character Mapping and Ad-hoc Adaptation: Edinburgh's IWSLT 2020 Open Domain Translation System

Author: Bogoychev Nikolay
Chen Pinzhen
Germann Ulrich
Publication date: 01/01/2020

This paper describes the University of Edinburgh’s neural machine translation systems submitted to the IWSLT 2020 open domain Japanese-Chinese translation task. In addition to common techniques such as tokenisation and corpus cleaning, we explore character mapping and unsupervised decoding-time adaptation. Our techniques focus on leveraging the provided data, and we show the positive impact of each technique through gradual improvements in BLEU.

Crossref

Edinburgh Research Explorer


Fast and highly parallelizable phrase table for statistical machine translation

Author: Bogoychev Nikolay
Hoang Hieu
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 12/08/2016

Edinburgh Research Explorer

Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training

Author: Aji Alham Fikri
Bogoychev Nikolay
Heafield Kenneth
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

Crossref

Edinburgh Research Explorer

N-gram language models for massively parallel devices

Author: Bogoychev Nikolay
Lopez Adam
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 12/08/2016

Edinburgh Research Explorer

An Open Dataset and Model for Language Identification

Author: Birch Alexandra
Bogoychev Nikolay
Burchell Laurie
Heafield Kenneth
Publication date: 23/05/2023

Language identification (LID) is a fundamental step in many natural language processing pipelines. However, current LID systems are far from perfect, particularly on lower-resource languages. We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages, outperforming previous work. We achieve this by training on a curated dataset of monolingual data, whose reliability we ensure by manually auditing a sample from each source and each language. We make both the model and the dataset available to the research community. Finally, we carry out a detailed analysis of our model's performance, both in comparison to existing open models and by language class.

Comment: To be published in ACL 202
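The two headline metrics can be computed per language in a one-vs-rest fashion and then averaged without weighting. The sketch below assumes that "macro-average" applies to both F1 and the false positive rate; the abstract does not say how the false positive rate is aggregated, and the three-letter language labels in the example are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def one_vs_rest(gold: Sequence[str], pred: Sequence[str],
                label: str) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN treating `label` as the positive class."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        if p == label and g == label:
            tp += 1
        elif p == label:
            fp += 1  # predicted this language, gold was another
        elif g == label:
            fn += 1  # gold was this language, predicted another
        else:
            tn += 1
    return tp, fp, fn, tn


def macro_scores(gold: Sequence[str], pred: Sequence[str],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Unweighted mean over languages of F1 and false positive rate."""
    f1s, fprs = [], []
    for label in labels:
        tp, fp, fn, tn = one_vs_rest(gold, pred, label)
        f1_denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / f1_denom if f1_denom else 0.0)
        fprs.append(fp / (fp + tn) if fp + tn else 0.0)  # FPR = FP / (FP + TN)
    return {"macro_f1": sum(f1s) / len(labels),
            "macro_fpr": sum(fprs) / len(labels)}


gold = ["eng", "eng", "fra", "deu"]
pred = ["eng", "fra", "fra", "deu"]
print(macro_scores(gold, pred, ["eng", "fra", "deu"]))  # macro_f1 = 7/9, macro_fpr = 1/9
```

Macro-averaging gives every language equal weight, so a model that is poor on a handful of rare languages is penalised even if those languages contribute few test sentences.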

arXiv.org e-Print Archive

Edinburgh Research Explorer

In Neural Machine Translation, What Does Transfer Learning Transfer?

Author: Aji Alham Fikri
Bogoychev Nikolay
Heafield Kenneth
Sennrich Rico
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

Transfer learning improves quality for low-resource machine translation, but it is unclear what exactly it transfers. We perform several ablation studies that limit information transfer, then measure the quality impact across three language pairs to gain a black-box understanding of transfer learning. Word embeddings play an important role in transfer learning, particularly if they are properly aligned. Although transfer learning can be performed without embeddings, results are sub-optimal. In contrast, transferring only the embeddings but nothing else yields catastrophic results. We then investigate diagonal alignments with auto-encoders over real languages and randomly generated sequences, finding that even randomly generated sequences used as parents yield noticeable but smaller gains. Finally, transfer learning can eliminate the need for a warm-up phase when training transformer models in high-resource language pairs.
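The ablation conditions contrasted above (full transfer, transfer without embeddings, and embeddings only) can be framed as a choice of which parameter groups to copy from the parent model into the child before training. The sketch below is a toy illustration under stated assumptions: a flat dictionary of named parameter groups, hypothetical group names (`embed.*`, `encoder.*`, `decoder.*`), and parent and child models with identical shapes. It does not model the embedding alignment the abstract mentions, nor the authors' actual toolkit.

```python
import random
from typing import Dict, List

# Hypothetical flat parameter store: group name -> list of weights.
Params = Dict[str, List[float]]


def init_params(names: List[str], size: int, seed: int) -> Params:
    """Randomly initialise every named parameter group (toy stand-in)."""
    rng = random.Random(seed)
    return {n: [rng.gauss(0.0, 0.02) for _ in range(size)] for n in names}


def transfer(parent: Params, child: Params,
             copy_embeddings: bool, copy_body: bool) -> Params:
    """Return a child initialisation with selected parent groups copied in.

    Assumes embedding groups are the ones whose names start with "embed";
    every other group counts as the model body.
    """
    result = {name: list(values) for name, values in child.items()}
    for name, values in parent.items():
        is_embedding = name.startswith("embed")
        wanted = copy_embeddings if is_embedding else copy_body
        if wanted:
            result[name] = list(values)
    return result


NAMES = ["embed.src", "embed.tgt", "encoder.layer0", "decoder.layer0"]
parent = init_params(NAMES, size=4, seed=1)  # stands in for a trained parent
child = init_params(NAMES, size=4, seed=2)   # fresh child initialisation

full = transfer(parent, child, copy_embeddings=True, copy_body=True)
no_embeddings = transfer(parent, child, copy_embeddings=False, copy_body=True)
embeddings_only = transfer(parent, child, copy_embeddings=True, copy_body=False)
```

Each condition then trains the child from its own initialisation, so any quality difference can be attributed to the parameter groups that were (or were not) inherited from the parent.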

Crossref

Edinburgh Research Explorer

ZORA